
Strange behaviour on timestamp data type in Hadoop SQL

Asked: 2021-12-16T21:29:07, Author: Ken Masters


I am trying to retrieve the records that match the conditions below from a Hadoop database:

select
CUSTOMER_SITE_NBR as account_site_nbr,
SITE_USE_ID as account_site_use_id,
CREATION_DATE_TIME as create_date_time,
LAST_UPDATE_DATE_TIME as main_source_last_update_date_time
from hub_customer.dim_site_use_mdm
where cast(CREATION_DATE_TIME as date) BETWEEN '2020-02-01' and '2020-02-29'
and cast(LAST_UPDATE_DATE_TIME as date) = '2020-02-28'
and site_use_code <> 'HEADQUARTER'
order by
account_site_nbr,
account_site_use_id,
create_date_time,
main_source_last_update_date_time;

The query returned the following records:

[Screenshot of the query results not available]

As you can see, every value in the main_source_last_update_date_time column has a time part of 00:00:00, yet the timestamps stored in our database rarely fall exactly at 00:00:00.
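One way to put a number on "rarely" is to count how many stored values fall exactly on midnight. This is a minimal sketch, assuming the engine is Hive or Impala (the question does not say which); `hour()`, `minute()` and `second()` exist in both, and the aliases `total_rows` and `midnight_rows` are illustrative names:

```sql
-- Assumes Hive or Impala. Counts all rows and the rows whose
-- LAST_UPDATE_DATE_TIME has a time part of exactly 00:00:00.
-- NULL timestamps fall into the ELSE branch and are not counted as midnight.
SELECT
  COUNT(*) AS total_rows,
  SUM(CASE WHEN hour(LAST_UPDATE_DATE_TIME) = 0
            AND minute(LAST_UPDATE_DATE_TIME) = 0
            AND second(LAST_UPDATE_DATE_TIME) = 0
           THEN 1 ELSE 0 END) AS midnight_rows
FROM hub_customer.dim_site_use_mdm;
```

If `midnight_rows` is small relative to `total_rows`, the 00:00:00 values in the result above are unlikely to be what is actually stored.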

I tested two further cases:

Case 1 (the same query, additionally filtered on a single SITE_USE_ID), which gave an incorrect result:

select
CUSTOMER_SITE_NBR as account_site_nbr,
SITE_USE_ID as account_site_use_id,
CREATION_DATE_TIME as create_date_time,
LAST_UPDATE_DATE_TIME as main_source_last_update_date_time
from hub_customer.dim_site_use_mdm
where cast(CREATION_DATE_TIME as date) BETWEEN '2020-02-01' and '2020-02-29'
and cast(LAST_UPDATE_DATE_TIME as date) = '2020-02-28'
and site_use_code <> 'HEADQUARTER'
AND SITE_USE_ID = '100000010853754'
order by
account_site_nbr,
account_site_use_id,
create_date_time,
main_source_last_update_date_time;

[Screenshot of the Case 1 results not available]

Case 2 (no CAST conditions, filtered only on the same SITE_USE_ID):

select
CUSTOMER_SITE_NBR as account_site_nbr,
SITE_USE_ID as account_site_use_id,
CREATION_DATE_TIME as create_date_time,
LAST_UPDATE_DATE_TIME as main_source_last_update_date_time
from hub_customer.dim_site_use_mdm
where SITE_USE_ID = '100000010853754';

[Screenshot of the Case 2 results not available]

The second case returns the correct data. None of these queries uses CAST in the SELECT list; in the first query and in Case 1 the casts appear only in the WHERE clause. Even so, it looks as if the main_source_last_update_date_time column is converted to DATE and then back to TIMESTAMP, which would explain the 00:00:00 time part in the results. The issue occurs only with this table: we run similar queries against other tables and they return correct results.
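The suspected DATE round trip can be reproduced in isolation. This sketch assumes a Hive or Impala version that supports the DATE type and SELECT without a FROM clause; the literal '2020-02-28 14:35:12' is a made-up sample value, not data from the table:

```sql
-- Assumes Hive or Impala with DATE support.
-- raw_ts:        the original timestamp, time of day intact
-- ts_as_date:    the same value truncated to a DATE
-- round_trip_ts: the DATE widened back to TIMESTAMP, so the time becomes midnight
SELECT
  CAST('2020-02-28 14:35:12' AS TIMESTAMP)                                 AS raw_ts,
  CAST(CAST('2020-02-28 14:35:12' AS TIMESTAMP) AS DATE)                   AS ts_as_date,
  CAST(CAST(CAST('2020-02-28 14:35:12' AS TIMESTAMP) AS DATE) AS TIMESTAMP) AS round_trip_ts;
```

The last column keeps the date but loses the time of day, the same pattern as in the first result. This does not prove the engine rewrites the projected column, but it shows what the output would look like if it did.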

How can I find the cause of this issue, and what is the correct approach to fix it?
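A possible workaround, sketched under the assumption that the engine is Hive or Impala and that CREATION_DATE_TIME and LAST_UPDATE_DATE_TIME are declared as TIMESTAMP: first confirm the declared column types (and, if dim_site_use_mdm turns out to be a view, its definition), then filter on half-open timestamp ranges so the columns are never cast in the WHERE clause. This is an untested sketch, not a confirmed fix; if the 00:00:00 values persist with this form, the cause lies somewhere other than the WHERE-clause cast.

```sql
-- Step 1 (assumes Hive or Impala): check the declared column types
-- and the full table or view definition.
DESCRIBE hub_customer.dim_site_use_mdm;
SHOW CREATE TABLE hub_customer.dim_site_use_mdm;

-- Step 2: the original query rewritten with range predicates on the raw
-- TIMESTAMP columns instead of CAST(... AS DATE).
SELECT
  CUSTOMER_SITE_NBR     AS account_site_nbr,
  SITE_USE_ID           AS account_site_use_id,
  CREATION_DATE_TIME    AS create_date_time,
  LAST_UPDATE_DATE_TIME AS main_source_last_update_date_time
FROM hub_customer.dim_site_use_mdm
-- [2020-02-01, 2020-03-01) covers all of February 2020 (a leap year),
-- matching BETWEEN '2020-02-01' AND '2020-02-29' on DATE values.
WHERE CREATION_DATE_TIME >= CAST('2020-02-01 00:00:00' AS TIMESTAMP)
  AND CREATION_DATE_TIME <  CAST('2020-03-01 00:00:00' AS TIMESTAMP)
  -- [2020-02-28, 2020-02-29) covers the whole day 2020-02-28.
  AND LAST_UPDATE_DATE_TIME >= CAST('2020-02-28 00:00:00' AS TIMESTAMP)
  AND LAST_UPDATE_DATE_TIME <  CAST('2020-02-29 00:00:00' AS TIMESTAMP)
  AND site_use_code <> 'HEADQUARTER'
ORDER BY
  account_site_nbr,
  account_site_use_id,
  create_date_time,
  main_source_last_update_date_time;
```

Comparing the raw columns against timestamp bounds also leaves the stored values untouched in the predicate, so any change in the projected values would have to come from somewhere else.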

Kind regards,

Author: Ken Masters. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/70379852/strange-behaviour-on-timestamp-data-type-in-hadoop-sql